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Abstract — We introduce an algorithm for the efficient computation 
of the continuous Haar transform of 2D patterns that can be described 
by polygons. These patterns are ubiquitous in VLSI processes where 
they are used to describe design and mask layouts. There, speed is of 
paramount importance due to the magnitude of the problems to be 
solved and hence very fast algorithms are needed. We show that by 
techniques borrowed from computational geometry we are not only 
able to compute the continuous Haar transform directly, but also to 
do it quickly. This is achieved by massively pruning the transform 
tree and thus dramatically decreasing the computational load when 
the number of vertices is small, as is the case for VLSI layouts. We 
call this new algorithm the pruned continuous Haar transform. We 
implement this algorithm and show that for patterns found in VLSI 
layouts the proposed algorithm was in the worst case as fast as its 
discrete counterpart and up to 12 times faster. 

Keywords-Hsksa: transform; piecewise constant; 2D polygons; VLSI 
design; 

I. Introduction 

The Haar transform (HT) is often a tool of choice in image 
processing due to its edge detection property, low complexity and 
the simplicity of its implementation. It is particularly suited for 
piecewise constant functions that have a very sparse and accurate 
representation in the Haar domain. An important class of two- 
dimensional (2D) piecewise constant functions is the class of 
functions described by a union of disjoint polygonal subsets of 
R^. Such a description is often used in different areas of image 
processing such as contour detection, segmentation, tomography 
image reconstruction [1] or for VLSI layouts description [2]. A 
polygonal shape is usually described by an ordered list of its 
vertices. This description has the advantage of being very compact 
and natural to understand. Many algorithms in computational 
geometry make efficient use of this description to solve various 
problems like intersections of polygons, area computations or 
point inclusion [3]. However, this description has no fixed length 
which makes it more cumbersome for use in other applications in 
image processing including machine learning, pattern matching or 
measuring similarity. The Haar transfoiTn provides such a fixed- 
length representation. 

Optical lithography is the process that allows mass production 
of VLSI circuits [4]. The HT has been used so far in lithogra- 
phy to compress the Fourier precompensation filters for electron 
beam lithography [5] and also to regularize the obtained mask in 
inverse lithography [6]. More recently, Kryszczuk et al. introduced 
the direct printability prediction of VLSI layouts using machine 
learning techniques [7]. They use fixed-length feature vectors 
from orthogonal transforms and train a classifier to predict the 
printability of VLSI layouts without having to go through the 



detailed, and thus computationally expensive, simulation of the 
physical process of lithography. The HT is a perfect candidate to 
provide features in that case due to its close match to the polygons 
found in VLSI layouts. However, to obtain these features one has 
first to perform the transform of the enormous amount of data 
contained in modern VLSI layouts. It is thus crucial to have a very 
fast algorithm to yield HT coefficients from the vertex description 
of the polygons. 

The most straightforward way would be to first create a discrete 
image by sampling the polygons, and then use the discrete Haar 
transform (DHT) on the resulting image. However, the polygons 
describe an inherently continuous function, which allows us to 
compute the continuous Haar transform (CHT) coefficients instead. 
By using techniques borrowed from computational geometry to 
compute the inner products with the CHT basis functions, we are 
able to massively prune the transform flow-diagram in addition to 
avoiding sampling completely. This leads to a dramatic decrease 
of the computational load when the number of vertices is small. 
We call this new algorithm pruned continuous Haar transform 
(PCHT). The outputs of the DHT and the PCHT are identical for 
2D polygonal patterns. The PCHT was concretely implemented in 
a lithography tool and proved to have significantly lower runtime 
compared to the DHT for the particular case of rectilinear polygons 
from VLSI design layouts. 

The main contribution of this work is PCHT, a fast algorithm, 
and to the best of our knowledge the first of its kind for the CHT of 
2D piecewise constant polygonal patterns. Its efficiency compared 
to the DHT for the case of VLSI layouts is demonstrated with 
potential high impact for pattern matching techniques envisioned 
in computational lithography. 

In Section II, we will first briefly introduce the signal model 
we consider, namely 2D piecewise constant polygonal patterns. In 
Section III, we give a reminder on the ID and 2D Haar transform 
while Section IV presents the PCHT algorithm. The results of the 
application to VLSI design layouts are given in Section V and we 
conclude in Section VI. 

II. Signal Model 

The signal model we consider is one of 2D piecewise constant 
polygonal patterns. This is the class of images described by the 
union of a finite number of disjoint simple polygons. Mathemati- 
cally, a polygon is described by a set of points called vertices. A 
polygon P C is typically defined by its boundary, which is a 
collection of straight segments, called edges. The polygon contains 
all the points inside the boundary. The description of a polygon is 
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Figure 1. Example of polygons. Polygon A is a simple polygon. Polygon 
B is a rectilinear polygon such as those found in VLSI layouts. 
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the list of its K vertices, ordered clockwise: 

{{xo,yo),. . . ,{xK-i,yK-i)} , {xi,y^) € R^. 

where any two successive vertices describe one edge of the bound- 
ary. In addition, simple polygons have the property that no two 
edges intersect each other. A subclass of simple polygons termed 
rectilinear comprises all those with only right angles and is the 
building block of VLSI layouts. An example of such polygons is 
shown in Figure 1. 

A 2D piecewise constant polygonal pattern is described by 
a collection of M disjoint polygons each with an associated 
weight {("Pi, ^ and with disjoint interiors, i.e. Int ('Pi) PI 

Int {Vj) = Vi 7^ j, where Int(P) is the interior of polygon V, 
and Wi € R. Finally the continuous image model is 



(1) 



1 if {x,y) e V 



where we use an indicator fiuiction lp(a;, j/) 
and otherwise. 

III. The Haar Transform 

The ID Haar basis is an orthonormal basis on [0, T), composed 
of the family of functions 



k = 0,. 



where 
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The functions ip and are respectively defined as 

1 if < t < 



■ ^ ifO<a;<l 
otherwise 



-1 if i < t < 1 
otherwise 



As the 2D Haar basis is separable, we can define it in terms of 
the ID basis. Now we want to work over a surface T = [0,Tx) x 
[0, Ty) we call a tile. The scaling function is given by: 



'fij,k^,ky{x,y) = 'fij^k!ix)'fiXkJiy) 



(2) 



where j = kx = ky = 0. The other basis functions are given by 
the possible combinations of ip and ijj, one of them being: 
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(3) 



Figure 2. A pruned signal flow of the ID CHT. The full flow-diagram is 
shown in light grey. The signal transformed is f{t) = u{t — 3), defined 
on [0,8), where u{t) is the Heaviside function. Xj^k = (/ , <Pj^k) and 
Cj,k = (/, V'j.fe)- 



where j £ N is the scale and kx, fcj, £ {0, . . . , 2-' — 1} the shifts 
in the x and y directions respectively. Similarly, we get i'f'^l 
and f, ■ The first and second letter in the superscript indicate 
which basis function is used for x and y directions respectively, h 
and / indicate tji and ip respectively. 

Given the basis functions defined in (2) and (3), we derive the 
Haar transform as the inner product between the function / to 
transform and the Haar basis functions. Using the L^{T) and the 
1^{T) inner products, with a discretized tile T and discretized basis 
functions, we respectively get the dyadic continuous and discrete 
transforms [8]. The CHT and DHT coefficients are identical for 
2D piecewise constant polygonal patterns. 

Both the CHT and the DHT can be computed using the 

fast orthogonal wavelet transform (FWT) [9]. This algorithm is 
constructed using the two-scale relationships that link the basis 
functions at different scales: 

= V2^g„>fi{2t - n), V W = h„^i2t - n), 

n n 

where g„ and h„ are the taps of two discrete-time filters [8]. 
The Haar filters are defined as g„ — [2^^^^ 2^^^^] and h„ = 
J2-1/2 _2-i/2j jjjjg results in a Cooley-Tukey butterfly structure 
[10] where only the inner products with the scaling function at the 
lowest level need be computed. The full flow diagram for a length-8 
ID transform is shown in light grey in Figure 2. 

Using the separability of the 2D transform and the two-scale re- 
lationships, we obtain the relations between the 2D basis functions 
of the different scales. For example, i>j'^kl,k ^ written as 
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i{x,y). 



By replacing h„gm in the sum by g-ngm, gnhm and hnhm we 
obtain ipj,k^,ky, %!ll,ky and V'j^l'l'.fe,, respectively. As in the ID 
case, these relations induce a 2D butterfly structure. Therefore 
transform coefficients can be computed as a linear combination 
of inner products with the scaling function at different scales. 




Figure 3. From left to right: Examples of 1024mnxl024mii tUes from 
respectively Ml, M2 and CA layers. 

IV. Pruned Continuous Haar Transform 

A. Algorithm Derivation 

Let us consider the FWT described in Section III. First, using 
the signal model from (1) and the linearity of the inner product, 
we can decompose the transform into a sum of inner products of 
individual polygons with basis functions: 

M-1 
i=0 

Thus, from now on we consider only the transform of a single 
polygon. The second idea is to use computational geometry tech- 
niques to compute the inner product. The continuous inner product 
between the indicator of a polygon and the scaling function is the 
area of the geometrical intersection of the polygon and the support 
of the scaling function, multiplied by 2^ / ^T^Ty. 

The Haar transform acts as a discontinuity detector and all the 
transform coefficients will be zero except for basis functions that 
intersect the boundary of the polygon. As a consequence, the 
basis functions completely inside or outside a polygon yield a 
zero inner product. Moreover, all the coefficients below such a 
basis function in the transform tree are also zero (see Figure 2). 
Therefore the transform can be written as a divide-and-conquer 
algorithm. Divide the tile in four rectangular parts recursively until 
the part considered is completely inside or outside the polygon. 
Pseudocode for the PCHT is given in Algorithm 1, in which 

Tj,k^,ky = [k^n/V ,{k^ + l)Ty/V)x[kyTylV ,{ky + l)Ty/V) 

is the support of (pj,k^,ky An example of the pruned transform 
flow-diagram for the ID case is shown in black in Figure 2. In 
order to compute all the transform coefficients, the algorithm is 
called in the following way: 

M-1 

Xo,o,o = so ^ PCHT {Vi, 0, 0, 0, soWi) , 

where Xj^k.^,ky = {.f , 'p3,k-^,ky) and so = Xj -jT^Ty. The 
transform coefficients are C^°'^l^ky = (/> '^tkLky)- 

V. Application to VLSI Layouts 

We now show a practical example of the application of this 
algorithm to compute the CHT coefficients of a VLSI layout. 
In practice a layout is described using a vector format such as 
OASIS [2]. Compared to general simple polygons, those found 
in VLSI layouts have two additional properties. All vertices are 
placed on the integer grid and all edges are parallel to either the x 
or y axis. We call these polygons rectilinear. The routine used to 
compute the intersection area in Algorithm 1 is specifically adapted 



Algorithm 1 VCm:{V,j,k^,ky,s) 

Require: The polygon to transform V, the scale j, the shifts h, 

and ky and the scaling factor s. 
Ensure: c'^hi) ^^(ih) ^(j(hh) ^.^^^^^^ jjje transform coefficients. 

1: i IntersectionArea {V,Tj^k^^ky) 

2: if i = or i = T^Tyli?^ or j = J then 

3: Return i 

4: end if 

5: x^VQYLT{V,j + \,1k-^,2ky,2s) 

6: t/ ^ PCHT(P,i + 1, 2fe + 1, 2ky, 2s) 

1: z*r- PCHT^-P, J 1, 2k^, 2ky + 1, 2s) 

8: t PCHTCP, j -I- 1, 2fe -|- 1, 2ky + 1, 2s) 

9: a X — y 
10: h-^ x + y 
11: c-i^ z — t 
12: d ^ z + i 

13: , ^ , + s(a + c) 

14: Cfl\ ^Cfl\ +s(h-d) 

cttky^ci::lky^<'^--) 

16: Return h + d 



for rectilinear polygons from classical computational geometry 
techniques [3]. 

We implemented the PCHT and DHT algorithms in a compu- 
tational lithography tool that we ran on a 3GHz Intel Xeon 5450 
running Linux in 64-bit mode. All the code is C-H-, single-threaded 
and was compiled using GCC 4.1.2 with option "-03". The DHT 
was custom implemented taking into account the knowledge that 
the input image is binary. For both transforms the output coeffi- 
cients were discarded instead of being stored in order to minimized 
the impact of memory transfers on the runtime measurements. We 
ran a benchmark of the PCHT and the DHT on different layers of a 
22nm VLSI layout. Layers Ml and M2 are metal layers that contain 
both rectangles and other polygons, while the contact array (CA) 
layer contains only rectangles. Examples of tiles from the different 
layers are shown in Figure 3. Figure 4 shows the runtime as a 
function of the number of vertices K in 1024nmx 1024nm tiles 
from the Ml layer, the worst configuration for the PCHT in our 
experiment. The empirical distribution of the number of vertices 
in the tiles is shown in light grey. Although the runtime of the 
continuous transform grows with it outperforms its discrete 
counterpart for about half the tiles. The peaks around K = 190 
are caused by the very low number of tiles in that range, as shown 
by the empirical distribution, and all these tiles having a worse than 
average runtime. The DHT also shows a slight dependence on K 
due to the time needed to create the discrete image. The average 
speed-up of the runtime as a function of the tile size is shown 
in Figure 5. The speed-up is defined as the ratio of the runtimes 
of the DHT and pruned CHT. Because it has the highest vertex 
density, the Ml layer shows the least improvement, between 1 and 
3 times speed-up. For large tiles, layers M2 and CA show speed-up 
over 6 and 12 times, respectively. In a practical scenario where the 
output coefficients need to be stored for further use we expect the 
PCHT to outperform even more the DHT as its pruned structure 
avoids completely the computation of zero coefficients and thus 
their storage and associated memory transfers. On the other hand. 
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Figure 4. Median runtime of the PCHT (dashed line) and DHT (plain 
line) of 1024nmx 1024nm tiles from the Ml layer containing K vertices. 
Tiles with K > 200 have a lower runtime because they contain exclusively 
rectangles which are less complex. The empirical distribution of the number 
of vertices is shown in grey. 



the DHT has no knowledge of which coefficients will be zero and 
thus either tries to store every output coefficients or an if statement 
can be used if we know in advance that most coefficients will be 
zero, as is the case here. In addition, further optimizations such as 
parallelization and optimization for the cache size are possible. 

VI. Conclusions 

We introduced the PCHT, a new algorithm for the computation 
of the CHT of 2D polygonal patterns. We showed significant speed- 
up compared to the DHT in an implementation targeting rectilinear 
polygons found in VLSI layouts. We expect the PCHT to impact 
machine learning techniques being developed for application in 
computational lithography, such as printability prediction [7], as 
well as in the VLSI design process in general. 

The natural next step for our work would be to analyze the 
computational complexity of PCHT in order to validate theoret- 
ically its superiority over the DHT. The performance of both 
algorithins implementations should also be reassessed when the 
coefficients are stored in memory in order to account for the impact 
of memory transfers. Another important implementation step is 
the parallelization of the code as current and future increases in 
computation power come primarily through multi-core chips. The 
recursive nature of the PCHT algorithm makes it a perfect candidate 
for parallelization. 

We also intend to apply our vertex based approach to other 
transforms such as the continuous Fourier series. A fast algorithm 
to compute the Fourier series would also be very valuable in 
lithography where the fast Fourier transform is routinely used in 
optical lithography process simulation despite the introduction of 
aliasing due to the infinite bandwith of the polygons. Moreover, 
we expect an optimized vertex based fast continuous Fourier series 
algorithm to be inherently faster than the fast Fourier transform. 
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